Hinge loss

In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs).

For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as

\ell(y) = \max(0, 1-t \cdot y)

Note that y should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, y = \mathbf{w} \cdot \mathbf{x} + b, where (\mathbf{w}, b) are the parameters of the hyperplane and \mathbf{x} is the input variable(s).

When t and y have the same sign (meaning y predicts the right class) and |y| \ge 1, the hinge loss \ell(y) = 0. When they have opposite signs, \ell(y) increases linearly with y, and similarly if |y| < 1, even if it has the same sign (correct prediction, but not by enough margin).
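
As a concrete illustration, here is a minimal Python sketch (NumPy assumed; the weights, bias, and inputs are made up for illustration only) that evaluates the hinge loss of a linear decision function:

    import numpy as np

    def hinge_loss(t, y):
        """Binary hinge loss for a label t in {-1, +1} and a raw classifier score y."""
        return max(0.0, 1.0 - t * y)

    # Hypothetical linear classifier parameters (w, b), chosen only for illustration.
    w = np.array([2.0, -1.0])
    b = 0.5

    x = np.array([1.0, 0.5])      # input features
    y = np.dot(w, x) + b          # raw score (decision-function value, not a class label)

    print(hinge_loss(+1, y))      # 0.0: correct side with margin, since t*y = 2.0 >= 1
    print(hinge_loss(-1, y))      # 3.0: wrong sign, loss grows linearly with y
    print(hinge_loss(+1, 0.3))    # 0.7: correct sign but inside the margin (|y| < 1)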

The hinge loss is not a proper scoring rule.


Extensions
While binary SVMs are commonly extended to multiclass classification in a one-vs.-all or one-vs.-one fashion,
it is also possible to extend the hinge loss itself for such an end. Several different variations of multiclass hinge loss have been proposed. For example, Crammer and Singer defined it for a linear classifier as

\ell(y) = \max(0, 1 + \max_{y \ne t} \mathbf{w}_y \mathbf{x} - \mathbf{w}_t \mathbf{x}),

where t is the target label, and \mathbf{w}_t and \mathbf{w}_y are the model parameters.
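
A minimal sketch of this variant, assuming a weight matrix W whose rows are the per-class vectors \mathbf{w}_y (names hypothetical):

    import numpy as np

    def crammer_singer_hinge(W, x, t):
        """Crammer-Singer multiclass hinge loss.
        W is a (num_classes, num_features) matrix whose rows are the w_y;
        x is the feature vector and t is the index of the target class."""
        scores = W @ x                        # w_y . x for every class y
        rivals = np.delete(scores, t)         # scores of all classes y != t
        return max(0.0, 1.0 + rivals.max() - scores[t])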

Weston and Watkins provided a similar definition, but with a sum rather than a max:

\ell(y) = \sum_{y \ne t} \max(0, 1 + \mathbf{w}_y \mathbf{x} - \mathbf{w}_t \mathbf{x}).
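
Under the same assumptions as the sketch above, the Weston-Watkins variant simply sums the per-class margin violations instead of taking only the largest one:

    import numpy as np

    def weston_watkins_hinge(W, x, t):
        """Weston-Watkins multiclass hinge: sums margin violations over all y != t."""
        scores = W @ x
        violations = np.maximum(0.0, 1.0 + scores - scores[t])
        violations[t] = 0.0                   # the target class itself does not contribute
        return violations.sum()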

In structured prediction, the hinge loss can be further extended to structured output spaces. Structured SVMs with margin rescaling use the following variant, where \mathbf{w} denotes the SVM's parameters, \mathbf{y} the SVM's predictions, \phi the joint feature function, and \Delta the Hamming loss:

\begin{align}
\ell(\mathbf{y}) & = \max(0, \Delta(\mathbf{y}, \mathbf{t}) + \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{y}) \rangle - \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{t}) \rangle) \\
                 & = \max(0, \max_{y \in \mathcal{Y}} \left( \Delta(\mathbf{y}, \mathbf{t}) + \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{y}) \rangle \right) - \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{t}) \rangle)
\end{align}.
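
A schematic sketch of margin rescaling over a small, explicitly enumerated output space; the feature map phi, the task loss delta, and the candidate set are hypothetical placeholders:

    import numpy as np

    def structured_hinge(w, phi, delta, x, t, candidates):
        """Margin-rescaled structured hinge loss over an enumerated output space.
        w: parameter vector; phi(x, y): joint feature map; delta(y, t): task loss
        (e.g. Hamming loss); candidates: the candidate outputs making up Y."""
        score_t = w @ phi(x, t)
        augmented = [delta(y, t) + w @ phi(x, y) for y in candidates]
        return max(0.0, max(augmented) - score_t)

In practice the inner maximization (loss-augmented inference) is solved by a task-specific combinatorial algorithm rather than by enumerating \mathcal{Y} as this sketch does.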


Optimization
The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it. It is not differentiable, but has a subgradient with respect to the model parameters of a linear SVM with score function y = \mathbf{w} \cdot \mathbf{x} that is given by

\frac{\partial\ell}{\partial w_i} = \begin{cases}
-t \cdot x_i & \text{if } t \cdot y < 1, \\
0            & \text{otherwise}.
\end{cases}
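
Using this subgradient, a plain stochastic subgradient step for a linear SVM can be sketched as follows (the learning rate and the regularization-free objective are simplifying assumptions; practical solvers such as Pegasos add an L2 term):

    import numpy as np

    def sgd_hinge_step(w, x, t, lr=0.1):
        """One stochastic subgradient step on the unregularized hinge loss
        for a linear score y = w . x, following the case distinction above."""
        y = w @ x
        if t * y < 1:                 # margin violated or point misclassified
            w = w + lr * t * x        # subgradient is -t*x, so descend by adding lr*t*x
        return w                      # otherwise the subgradient is zero: no update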

However, since the derivative of the hinge loss at ty = 1 is undefined, smoothed versions may be preferred for optimization, such as Rennie and Srebro's

\ell(y) = \begin{cases}
\frac{1}{2} - ty & \text{if } ~~ ty \le 0, \\
\frac{1}{2} (1 - ty)^2 & \text{if } ~~ 0 < ty < 1, \\
0 & \text{if } ~~ 1 \le ty
\end{cases}

or the quadratically smoothed

\ell_\gamma(y) = \begin{cases}
\frac{1}{2\gamma} \max(0, 1 - ty)^2 & \text{if} ~~ ty \ge 1 - \gamma, \\
1 - \frac{\gamma}{2} - ty & \text{otherwise}
\end{cases}

suggested by Zhang. The modified Huber loss L is a special case of this loss function with \gamma = 2, specifically L(t,y) = 4 \ell_2(y).
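
Both smoothed variants can be coded directly from the piecewise definitions above; the loop at the end is a numerical spot-check (not a proof) of the stated relation between Zhang's loss at \gamma = 2 and the modified Huber loss, assuming its usual piecewise form:

    import numpy as np

    def rennie_srebro_smooth_hinge(t, y):
        """Rennie and Srebro's smoothed hinge, as defined piecewise above."""
        ty = t * y
        if ty <= 0:
            return 0.5 - ty
        if ty < 1:
            return 0.5 * (1.0 - ty) ** 2
        return 0.0

    def zhang_smooth_hinge(t, y, gamma):
        """Zhang's quadratically smoothed hinge loss l_gamma."""
        ty = t * y
        if ty >= 1.0 - gamma:
            return max(0.0, 1.0 - ty) ** 2 / (2.0 * gamma)
        return 1.0 - gamma / 2.0 - ty

    # Spot-check the identity L(t, y) = 4 * l_2(y) against the piecewise
    # modified Huber loss: (1 - ty)_+^2 for ty >= -1, and -4ty otherwise.
    for ty in (-2.0, -0.5, 0.0, 0.5, 1.5):
        lhs = max(0.0, 1.0 - ty) ** 2 if ty >= -1.0 else -4.0 * ty
        assert np.isclose(lhs, 4.0 * zhang_smooth_hinge(1.0, ty, gamma=2.0))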

